Abstract. Binary hashing is widely used for efficient approximate nearest
neighbor search. Although various binary hashing methods have been proposed,
very few are feasible for the extremely high-dimensional features often used
in visual tasks today. We propose a novel, highly sparse linear hashing method
based on pairwise rotations. The encoding cost of the proposed algorithm is
O(n log n) for n-dimensional features, whereas that of the existing
state-of-the-art method is typically O(n^2). The proposed method is also
remarkably faster in the learning phase. Along with this efficiency, its
retrieval accuracy is comparable to or slightly better than the state of the
art. The pairwise rotations used in our method are formulated from an
analytical study of the trade-off between the quantization error and the
entropy of binary codes. Although these hashing criteria are widely used in
previous research, their analytical behavior has rarely been studied. All
building blocks of our algorithm are based on the analytical solution, which
yields a fairly simple and efficient procedure.


1 Introduction


Approximate nearest neighbor (ANN) search is widely used in retrieval
[1,2,3,4], and the scale of its databases has been increasing rapidly in
recent times. Furthermore, to achieve more accurate retrieval results,
high-dimensional features such as Fisher Vectors [?] and VLAD [7] are being
used in the computer vision community. To achieve feasible retrieval with such
features, highly efficient ANN search methods are needed.

Vector quantization based methods are widely used and actively studied for
ANN search. For high dimensions, Product Quantization (PQ) [8] and its family
are the state-of-the-art methods [?]. PQ decomposes a high-dimensional vector
space into a direct product of small subspaces. Clustering is then applied to
each subspace to obtain representative vectors (quantizers). Although PQ-based
methods are applicable to high-dimensional features, it is still not easy to
obtain good quantizers in some cases, and the random rotation often needed
before PQ is expensive in high dimensions. The floating-point distance
calculation needed for retrieval is also expensive compared to binary-based
methods [?].

Binary hashing is one of the most commonly used techniques for efficient
retrieval [11,12,13,14], recognition [15], and other problems [7,16]. It
refers to a family of methods that transform real-valued feature vectors into
binary-valued ones. Binary-valued vectors are highly favorable for large-scale
or high-dimensional tasks because they provide high memory efficiency and fast
Hamming distance calculation. Many methods have been proposed. The major
approaches can be categorized as vector quantization (VQ) based methods
[17,18,19,15], hyperplane-based linear methods [20,21,10], and nonlinear
hashing function methods [22,23,24,25,26].

A typical nonlinear method is Spectral Hashing, whose hashing functions are
nonlinear eigenfunctions derived from the distribution of the data. Some
variants of Locality Sensitive Hashing use nonlinear hashing functions [27].
Kernelized approaches have also been proposed [28,29,30]. Spectral Hashing
ordinarily assumes a uniform distribution in order to derive an analytical
solution, and its precision is empirically lower than that of ITQ [20] and
other state-of-the-art methods for non-uniformly distributed data. To overcome
this difficulty, kernel-based approaches have been proposed [?], but they are
difficult to apply in high dimensions. The recently proposed Spherical Hashing
[31] is an example of a non-kernelized nonlinear method. Since its hashing
function is hypersphere-based, it also needs Euclidean distance calculations
for hashing, and the computational cost becomes large for high-dimensional
data.


Recently, a bilinear hashing method called BPBC, which is feasible in high
dimensions, was proposed [10]. To our knowledge, this is the first binary
hashing method that can treat 10K dimensions or higher. However, this method
folds feature vectors and rotates them bilinearly in the folded space, so it
cannot cover the whole special orthogonal group (rotation group) SO(n). There
is still no linear high-dimensional binary hashing method that can directly
treat SO(n).

In this paper, we propose a new, highly efficient linear binary hashing
method. Our method is inspired by Isotropic Hashing [32] and can be seen as
its natural extension. First, we study the meaning of the isotropic
transformation analytically. Then we develop an efficient isotropic hashing
algorithm and its extension using the trade-off between isotropy and entropy.
The recently proposed Sparse Isotropic Hashing method [33] produces sparse
rotation matrices that yield isotropic variances; however, its learning of
high-dimensional rotation matrices is not feasible in practice. Our main
contributions are as follows.


1: State-of-the-art computational cost and accuracy

Our algorithm has an O(n log n) encoding cost for n-dimensional features. The
previously known state-of-the-art method, BPBC, requires O(n^2/d + nd)
(typically d = 128, in the case without dimension reduction). We show that the
proposed algorithm is more accurate than BPBC. Moreover, it is remarkably
faster in the learning phase. The most costly part of our algorithm is the
calculation of a variance-covariance matrix. We only need O(n^2 log n)
computational cost in the learning iteration loop, whereas BPBC requires
O(n^2 log n) in each iteration step. Therefore, our method is faster than BPBC
in practice, although the total computational cost of learning has the same
order, O(mn^2), for both our algorithm and BPBC, where m is the number of
training data.

2: Analytical treatment of hashing criteria

Typical criteria for measuring hashing performance are the quantization error,
the variance of each bit, and the entropy. To the authors' knowledge, the
analytical treatment of these criteria has not yet been well studied. We
present an analytical result and a set of algorithms derived naturally from
it. The analytical calculation is mainly based on the Gaussian distribution.
This does not mean that the proposed algorithm is only applicable to
Gaussian-distributed data: because any non-Gaussian distribution can be
expanded around a Gaussian [34], assuming a Gaussian distribution amounts to
taking the lowest order of the expansion. As will be discussed below, the
lowest-order approximation gives sufficient hashing accuracy and yields an
extremely efficient algorithm.

2 Theoretical Background 

2.1 Quantization Error for Binary Hashing 

As with clustering methods such as k-means, most binary hashing algorithms
aim at minimizing the quantization error between the binarized codes and the
original feature vectors [20,10]. In this study, therefore, the properties of
the quantization error are first investigated analytically. The result is then
used to develop the proposed binary hashing algorithm.

Most linear binary hashing methods consist of a translation and a linear
transformation,

$b(x) = \mathrm{sgn}(A(x - t)),$    (1)

$x, t \in \mathbb{R}^{n}, \quad b(x) \in \{-1, 1\}^{m}, \quad A \in \mathbb{R}^{m \times n}.$

In the following discussion, the translation is assumed to be mean centering.
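As a minimal sketch of Eq. (1), the encoding step can be written with NumPy as follows. The function name `encode`, the identity matrix used for A, and the convention that a zero projection maps to +1 are assumptions made only for this illustration.

```python
import numpy as np

def encode(X, A, t):
    """Row-wise b(x) = sgn(A (x - t)); zero projections are mapped to +1 (assumption)."""
    Y = (X - t) @ A.T                    # linear transformation of the centered data
    return np.where(Y >= 0, 1, -1).astype(np.int8)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8)) + 3.0     # toy features with a non-zero mean
t = X.mean(axis=0)                       # translation = mean centering
A = np.eye(8)                            # placeholder transformation (hypothetical)
B = encode(X, A, t)                      # codes in {-1, +1}^8
```

Any learned transformation, such as the product of sparse rotations developed in Section 3, would take the place of the identity matrix here.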

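The quantization error is defined as the sum of squared Euclidean distances
between the (transformed) original feature vectors and their binarized
vectors,

$\mathcal{Q} = \frac{1}{N} \sum_{i=1}^{N} \| A(x_i - t) - b(x_i) \|^2,$    (2)

where N is the number of data points. When the data follow an arbitrary
distribution function p(x), the mean quantization error can be written down
explicitly. For two-dimensional data, with $x = (x_1, x_2)$ denoting the
mean-centered, transformed coordinates,

$\mathcal{Q} = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \, \| x - \mathrm{sgn}(x) \|^2 \, p(x).$    (3)

It is then generally calculated as follows:

$\mathcal{Q} = \int_{-\infty}^{\infty} dx_1 \int_{-\infty}^{\infty} dx_2 \, (x_1^2 + x_2^2 + 2) \, p(x) - 2 \int_{-\infty}^{\infty} dx_1 \, |x_1| \, P_1(x_1) - 2 \int_{-\infty}^{\infty} dx_2 \, |x_2| \, P_2(x_2),$    (4)

where $P_1(\cdot)$ and $P_2(\cdot)$ are the marginal distributions with
respect to $x_1$ and $x_2$.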
If the distribution is assumed to be Gaussian with zero mean (after mean
centering) and variance-covariance matrix $\Sigma$, $\mathcal{Q}$ is
calculated as

$\mathcal{Q} = 2 + \mathrm{Tr}(\Sigma) - 2 \sqrt{\frac{2}{\pi}} \left( \sqrt{\sigma_{11}} + \sqrt{\sigma_{22}} \right),$    (5)

where $\sqrt{\sigma_{11}}$ and $\sqrt{\sigma_{22}}$ are the standard
deviations of each dimension. These results extend straightforwardly to higher
dimensions.
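The closed form in Eq. (5) can be checked numerically. The sketch below compares it with a Monte Carlo estimate, using the fact that E|x| = sigma * sqrt(2/pi) for a zero-mean Gaussian. The function names, the example covariance matrix, and the n-dimensional form of the formula (n + Tr(Sigma) minus the sum over all standard deviations) are assumptions introduced for illustration.

```python
import numpy as np

def gaussian_quantization_error(Sigma):
    """Eq. (5) written for n dimensions:
    n + Tr(Sigma) - 2*sqrt(2/pi) * sum_k sqrt(sigma_kk)."""
    Sigma = np.asarray(Sigma, dtype=float)
    n = Sigma.shape[0]
    stds = np.sqrt(np.diag(Sigma))
    return n + np.trace(Sigma) - 2.0 * np.sqrt(2.0 / np.pi) * stds.sum()

def empirical_quantization_error(X):
    """Mean squared distance between each row x and its sign vector sgn(x)."""
    signs = np.where(X >= 0, 1.0, -1.0)
    return np.mean(np.sum((X - signs) ** 2, axis=1))

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])                      # hypothetical 2-D covariance
X = rng.multivariate_normal(np.zeros(2), Sigma, size=200_000)

q_closed = gaussian_quantization_error(Sigma)       # analytical value
q_mc = empirical_quantization_error(X)              # sampled estimate
print(q_closed, q_mc)
```

Note that the closed form depends only on the diagonal of the covariance matrix, which is why the later sections can work with variances alone.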


2.2 Quantization Error Minimization in Gaussian Distribution with 
Rotational Transformation 

Some binary hashing algorithms use an orthogonal transformation [20,10]. This
means that their purpose is to find the cost-minimizing point in the rotation
group SO(n). We also consider the rotation group in this paper.

Under a rotation, minimizing $\mathcal{Q}$ is equivalent to maximizing
$\sqrt{\sigma_{11}} + \sqrt{\sigma_{22}}$ subject to
$\mathrm{Tr}(\Sigma) = \mathrm{const}$. Since the constancy of
$\mathrm{Tr}(\Sigma)$ constrains the two-dimensional vector
$(\sqrt{\sigma_{11}}, \sqrt{\sigma_{22}})$ onto a circle, the solution is
$\sigma_{11} = \sigma_{22}$. This proves that Isotropic Hashing [32] is the
quantization-error-minimizing hashing for a Gaussian distribution. We can
therefore regard isotropy as a measure of quantization error.
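The constrained maximization above can also be verified in a few lines with the Cauchy-Schwarz inequality. The shorthand symbols u, v and c below are introduced only for this sketch.

```latex
% u = sqrt(sigma_11), v = sqrt(sigma_22), c = Tr(Sigma), which is invariant under rotation
\begin{align}
  u^2 + v^2 &= \sigma_{11} + \sigma_{22} = c, \\
  (u + v)^2 &\le 2\,(u^2 + v^2) = 2c
      \quad \text{with equality iff } u = v, \\
  \Rightarrow \quad
  \min_{R \in SO(2)} \mathcal{Q}
      &= 2 + c - 2\sqrt{\tfrac{2}{\pi}}\,\sqrt{2c}
      \quad \text{attained at } \sigma_{11} = \sigma_{22} = c/2 .
\end{align}
```

For example, with eigenvalues 2 and 1 (so c = 3) the minimum is approximately 1.09, compared with approximately 1.15 at the PCA orientation.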

2.3 Entropy and Quantization Error 

Next, the entropy of the binary code is calculated. Here, eigenvalues and an
angle are used instead of the variance-covariance matrix itself, and only the
two-dimensional case is treated. In this representation, the elements of the
variance-covariance matrix are

$\sigma_{11} = \bar{\lambda} + \tilde{\lambda} \cos 2\theta, \quad \sigma_{22} = \bar{\lambda} - \tilde{\lambda} \cos 2\theta, \quad \sigma_{12} = \tilde{\lambda} \sin 2\theta,$    (6)

$\bar{\lambda} = \frac{\lambda_1 + \lambda_2}{2}, \quad \tilde{\lambda} = \frac{\lambda_1 - \lambda_2}{2},$

where $\lambda_1$ and $\lambda_2$ are the eigenvalues of the
variance-covariance matrix ($\lambda_1 > \lambda_2$), and $\theta$ is the
angle between the $x_1$-axis and the direction of the longer axis of the
Gaussian ellipse (Fig. 1).

Fig. 1. Schematic illustration of a two-dimensional Gaussian distribution. The
ellipse describes the shape of the distribution. Left: eigenvalue and angle
parameterization (Eq. (6)); $\lambda_1$ and $\lambda_2$ are the eigenvalues of
the variance-covariance matrix. Right: rotation angles for the isotropic and
PCA transformations; $\theta_{iso}$ and $\theta_{pca}$ are given in Eqs. (9)
and (10).

Fig. 2. Quantization error and entropy of the binary code for $\lambda_1 = 2$,
$\lambda_2 = 1$. The x-axis is the angle $\theta$ given in Eq. (6). From the
symmetry of the Gaussian distribution, it is enough to consider the range
$\theta \in [0, \pi/2]$.


From the symmetry of the Gaussian distribution, it is enough to obtain the
probabilities of the binary codes (1, 1) and (-1, 1). These probabilities can
be calculated analytically as

$P_{(1,1)} = \frac{1}{2} - \frac{1}{2\pi} \tan^{-1}\left( \frac{2}{\gamma \sin 2\theta} \right),$

$P_{(-1,1)} = \frac{1}{2} - P_{(1,1)} = \frac{1}{2\pi} \tan^{-1}\left( \frac{2}{\gamma \sin 2\theta} \right), \quad \theta \in \left[ 0, \frac{\pi}{2} \right],$    (7)

where $\gamma$ is defined as
$\gamma = \sqrt{\lambda_1/\lambda_2} - \sqrt{\lambda_2/\lambda_1}$, which
determines the maximum correlation between $x_1$ and $x_2$ attainable under
rotation. The entropy of the two-dimensional binary code is then given as

$S(\gamma, \theta) = 2 \times \left( -P_{(1,1)} \log P_{(1,1)} - P_{(-1,1)} \log P_{(-1,1)} \right).$    (8)

Fig. 2 plots the quantization error and the entropy with respect to the angle
$\theta$. When the quantization error is minimized, the entropy is also
minimized, and vice versa. This means that the quantization error and the
entropy have a trade-off relationship. The compatibility of the two factors
depends on the "sharpness" of the distribution. When the distribution is sharp
($\lambda_1 \gg \lambda_2$), the entropy is heavily damaged by isotropic
variances. There is no trade-off if the distribution is circular
($\lambda_1 = \lambda_2$). In the general case, however, these criteria should
be balanced.
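The trade-off can be reproduced numerically from Eqs. (5)-(8). The sketch below evaluates both criteria on a grid of angles for eigenvalues 2 and 1, the setting of Fig. 2. It uses the form 1/4 + arctan(gamma sin(2 theta) / 2) / (2 pi) for the probability of the code (1, 1), which equals the expression in Eq. (7) for positive arguments because arctan(z) + arctan(1/z) = pi/2, and which stays finite at theta = 0. The function name and grid resolution are assumptions for illustration.

```python
import numpy as np

def code_entropy_and_error(lam1, lam2, theta):
    """Entropy (Eq. 8) and Gaussian quantization error (Eq. 5) at angle theta."""
    lam_bar, lam_til = 0.5 * (lam1 + lam2), 0.5 * (lam1 - lam2)
    s11 = lam_bar + lam_til * np.cos(2 * theta)          # Eq. (6)
    s22 = lam_bar - lam_til * np.cos(2 * theta)
    gamma = np.sqrt(lam1 / lam2) - np.sqrt(lam2 / lam1)
    # Equivalent to Eq. (7), but well defined at theta = 0
    p_same = 0.25 + np.arctan(0.5 * gamma * np.sin(2 * theta)) / (2 * np.pi)
    p_diff = 0.5 - p_same
    entropy = 2.0 * (-p_same * np.log(p_same) - p_diff * np.log(p_diff))
    error = 2.0 + s11 + s22 - 2.0 * np.sqrt(2.0 / np.pi) * (np.sqrt(s11) + np.sqrt(s22))
    return entropy, error

thetas = np.linspace(0.0, np.pi / 2, 91)                 # one-degree steps
S, E = code_entropy_and_error(2.0, 1.0, thetas)
print("error minimized at (deg):", np.degrees(thetas[np.argmin(E)]))
print("entropy minimized at (deg):", np.degrees(thetas[np.argmin(S)]))
```

Both minima fall at the isotropic angle of 45 degrees, while both maxima fall at the PCA orientation, which is the behavior the paragraph above describes.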

A similar analysis was recently proposed in [?]. In that paper, the
quantization error for Product Quantization was discussed. The authors showed
that it is bounded by the determinant of the variance-covariance matrix, and
proposed an algorithm that minimizes the bound under rotation. The result
indicates a trade-off between the quantization error and the entropy of the
Gaussian distribution. However, since the entropy of the Gaussian distribution
is invariant under rotation, this method only determines the partition of the
entire space into a set of small subspaces; rotations within each subspace are
not considered. By contrast, our analysis can address rotational optimality in
two-dimensional subspaces because we investigate the entropy of the binary
codes directly.

Another example is [?]. The authors proposed two criteria, namely the
"crossing sparse region" and the "balanced buckets". The first criterion can
be interpreted as quantization error minimization, and the second as quantizer
entropy maximization. We think that many existing methods can be interpreted
as such a trade-off between quantization error minimization and entropy
maximization.


3 Methods 

A binary hashing algorithm based on the theory discussed above is developed
as follows. The resulting transformation matrices are very sparse, which
substantially decreases the encoding cost.

3.1 Problem Statement 

The problem is to obtain the linear transformation matrix A in Eq. (1). Most
existing methods split the transformation into a dimension-reduction
projection $W \in \mathbb{R}^{m \times n}$ and a transformation in the reduced
space $Q \in \mathbb{R}^{m \times m}$. PCA is commonly used for reducing the
number of dimensions. However, since a PCA transformation matrix is dense, it
is difficult to learn the transformation and to encode efficiently in very
high-dimensional cases. In this paper, dimension reduction is not treated. It
is thus assumed that the original feature vector and the encoded binary vector
have the same number of dimensions, and only
$A = Q \in \mathbb{R}^{n \times n}$ is treated. Dimension reduction can be
done in a way similar to what we discuss below, but a detailed study is left
for future work.

3.2 Sequential Pairwise Isotropic Rotation 

First, we derive a transformation that makes the variances completely
isotropic. We can obtain a very sparse isotropic transformation matrix with
O(n log_2 n) fill-ins using pairwise isotropic rotations, whereas the original
Isotropic Hashing [32] needs a dense transformation matrix with O(n^2)
fill-ins.

[Figure 3: upper panels show three successive basic isotropic rotations, each
composed of a pairwise sorting matrix and a rotation matrix; lower panels plot
variances against sorted index.]

Fig. 3. Schematic illustration of basic isotropic rotations (in 8 dimensions).
Upper: structure of the transformation matrix. Non-zero matrix elements are
filled with gray. The sorting matrix is a permutation matrix that sorts the
variances in descending order. Rotation is applied to pairs of dimensions that
match the largest variances with the smallest ones. The basic rotation is
applied log_2 n times in succession for n dimensions. Lower: behavior of the
variances under successive multiplication by basic isotropic rotations. The
graphs show the sorted variances under sequential multiplication. The
rightmost graph is the (sorted) initial state. One basic isotropic rotation
makes the variances isotropic in pairs, and repeated application of the basic
rotation drives them exponentially fast to a globally isotropic state.

In two-dimensional space, there are only two isotropic transformations. They
correspond to $\theta = \pi/4, 3\pi/4$ in Eq. (6). From the symmetry of the
Gaussian distribution, it is enough to consider $\theta = \pi/4$. For any
two-dimensional variance-covariance matrix $\Sigma$, the rotation matrix that
makes the variances isotropic is

$R = \begin{pmatrix} \cos\theta_{iso} & -\sin\theta_{iso} \\ \sin\theta_{iso} & \cos\theta_{iso} \end{pmatrix}, \quad \theta_{iso} = \frac{1}{2} \tan^{-1}\left( \frac{\sigma_{11} - \sigma_{22}}{2\sigma_{12}} \right).$    (9)


To develop an isotropic transformation for the full dimension, we define the
"basic isotropic rotation". It consists of three steps. The first step is to
sort the dimensions by the diagonal elements of the variance-covariance matrix
in descending order. The second step is to create pairs of dimensions as
(1, n), (2, n - 1), and so on. The third step is to apply the isotropic
rotation (9) to each pair. This set of operations is represented by a
permutation (sorting) matrix and a rotation matrix with 2n fill-ins. This
transformation, which we call the "basic isotropic rotation"¹ in what follows,
makes the variances pairwise isotropic. We then apply this transformation
sequentially. Applying it twice makes the variances isotropic in groups of
four, three times in groups of eight (Fig. 3), and so on. Finally, by applying
the transformation $\lceil \log_2 n \rceil$ times, we obtain completely
isotropic variances.² The resulting transformation is a product of sparse
matrices with $2n \lceil \log_2 n \rceil$ fill-ins in total. This factorized
form is highly sparse, especially in high dimensions. A standard sparse matrix
data structure can be used, so memory usage and computational cost are
substantially decreased.

¹ When a permutation is odd, the determinant of the transformation matrix is
-1. In that case, the transformation is not an element of SO(n) but of O(n).
However, we can always obtain an element of SO(n) by applying an odd
permutation to the final matrix A. This amounts to a permutation of bits and
does not affect retrieval results.
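The sequential procedure of this section can be sketched as follows. This is an illustrative reading of the three steps, not the authors' implementation: it stores every factor as a dense NumPy array for readability, whereas a practical version would keep each factor in a sparse format. The function names and the random test covariance are assumptions, and `arctan2` is used in place of the plain arctangent of Eq. (9) so that a zero covariance is handled.

```python
import numpy as np

def basic_isotropic_rotation(Sigma):
    """Sort variances, pair (1,n),(2,n-1),..., and equalize each pair (Eq. 9).
    Returns the orthogonal matrix R @ P for this single step."""
    n = Sigma.shape[0]
    order = np.argsort(-np.diag(Sigma))        # step 1: descending variances
    P = np.eye(n)[order]                       # permutation (sorting) matrix
    S = P @ Sigma @ P.T
    R = np.eye(n)
    for k in range(n // 2):                    # step 2: pair largest with smallest
        i, j = k, n - 1 - k
        theta = 0.5 * np.arctan2(S[i, i] - S[j, j], 2.0 * S[i, j])  # step 3
        c, s = np.cos(theta), np.sin(theta)
        R[i, i], R[i, j] = c, -s
        R[j, i], R[j, j] = s, c
    return R @ P

def pairwise_isotropic_transform(Sigma):
    """Apply ceil(log2 n) basic isotropic rotations; return (A, A Sigma A^T)."""
    n = Sigma.shape[0]
    A, S = np.eye(n), Sigma.copy()
    for _ in range(int(np.ceil(np.log2(n)))):
        step = basic_isotropic_rotation(S)
        A = step @ A
        S = step @ S @ step.T
    return A, S

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 8))
Sigma = M @ M.T + 0.1 * np.eye(8)              # hypothetical 8-D covariance
A, S_iso = pairwise_isotropic_transform(Sigma)
print(np.diag(S_iso))                          # all entries close to Tr(Sigma) / 8
```

Because the pairs within one step are disjoint, each new variance depends only on the 2 x 2 block of its own pair, which is what allows all n/2 rotations of a step to be computed independently.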

3.3 Trade-off between Quantization Error and Entropy

A factorized sparse transformation that makes the variances completely
isotropic was obtained in the preceding section. However, Section 2.3 revealed
a trade-off between isotropy (quantization error minimization) and entropy
maximization. Since entropy reduction degrades retrieval accuracy, a balance
between isotropy and entropy should be kept. Accordingly, two methods for such
balancing are proposed below. The first is simpler and does not increase the
number of fill-ins. The second uses additional sparse rotation matrices. It
increases the number of fill-ins but has better accuracy than the first in
some cases.


PCA tilting (PCAT) In the first method, each pairwise rotation is "tilted"
from the isotropic angle toward the PCA angle (Fig. 1). The PCA angle
corresponds to $\theta = 0$ in Eq. (6). The entropy increases with this
tilting, since the PCA angle is the entropy-maximizing angle. The rotation
matrix is given by

$R(\lambda) = \begin{pmatrix} \cos\theta(\lambda) & -\sin\theta(\lambda) \\ \sin\theta(\lambda) & \cos\theta(\lambda) \end{pmatrix},$

$\theta(\lambda) = \theta_{iso} + \lambda (\theta_{pca} - \theta_{iso}), \quad \theta_{pca} = \frac{1}{2} \tan^{-1}\left( -\frac{2\sigma_{12}}{\sigma_{11} - \sigma_{22}} \right),$    (10)

where $\theta_{iso}$ is given in Eq. (9).

$\lambda$ is a tuning parameter ranging from zero (completely isotropic) to
one (completely PCA). We can control the balance between isotropy and entropy
by tuning $\lambda$. Since PCA tilting does not lead to completely isotropic
variances, there is no definite reason to stop applying basic rotations after
$\log_2 n$ times. However, it is not necessary to increase the number of basic
rotations, because $\log_2 n$ applications give sufficient accuracy in
practice. The number of fill-ins of the transformation matrix therefore does
not need to change.
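A small sketch of the tilted angle in Eq. (10) for a single pair follows. The helper `_atan_ratio`, the function names, and the example covariance are assumptions for illustration; the helper returns the principal value of the arctangent, so the isotropic and PCA angles always differ by pi/4 in magnitude.

```python
import numpy as np

def _atan_ratio(num, den):
    """Principal-value arctan(num / den) in [-pi/2, pi/2], defined for den == 0."""
    sign = 1.0 if den >= 0 else -1.0
    return np.arctan2(sign * num, abs(den))

def tilted_angle(s11, s22, s12, lam):
    """theta(lambda) of Eq. (10): lam = 0 is isotropic, lam = 1 is PCA."""
    theta_iso = 0.5 * _atan_ratio(s11 - s22, 2.0 * s12)    # Eq. (9)
    theta_pca = 0.5 * _atan_ratio(-2.0 * s12, s11 - s22)   # Eq. (10)
    return theta_iso + lam * (theta_pca - theta_iso)

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

Sigma2 = np.array([[2.0, 0.6],
                   [0.6, 1.0]])                            # hypothetical pair block
results = {}
for lam in (0.0, 0.5, 1.0):
    R = rotation(tilted_angle(2.0, 1.0, 0.6, lam))
    results[lam] = R @ Sigma2 @ R.T                        # covariance after tilting
    print(lam, np.round(results[lam], 4))
```

At lam = 0 the two diagonal entries coincide, at lam = 1 the off-diagonal entry vanishes, and intermediate values interpolate between the two regimes.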


Random Sparse PCA Rotation (RSPCA) The second method applies additional
sparse rotations after the completely isotropic transformation has been
obtained. The additional matrices have the same form as the basic isotropic
rotation, but the n/2 rotational pairs are chosen randomly and the PCA
rotation ($\theta_{pca}$ given in Eq. (10)) is applied to each pair. This
procedure is called the "basic PCA rotation". Although it is not obvious how
many times the basic PCA rotation should be applied, the experiments discussed
in the following section show that O(log_2 n) rotations attain maximal
retrieval accuracy. The increase in the number of fill-ins of the
transformation matrix is therefore very small.

² Strictly speaking, completely isotropic variances can be obtained only when
the number of dimensions is a power of two. For other dimensions, an infinite
number of basic isotropic rotations is needed. In practice, however,
sufficiently isotropic variances can be obtained with
$\lceil \log_2 n \rceil$ applications of the transformation.
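One basic PCA rotation can be sketched as below. The function name, the way random disjoint pairs are drawn (a random permutation cut into consecutive pairs), and the dense matrix representation are assumptions for illustration. The angle uses `arctan2`, which solves the same tangent condition as Eq. (10) but may differ from the principal value by pi/2; either choice decorrelates the pair.

```python
import numpy as np

def basic_pca_rotation(Sigma, rng):
    """Rotate n/2 randomly chosen disjoint pairs so that each pair is decorrelated."""
    n = Sigma.shape[0]
    perm = rng.permutation(n)
    pairs = perm[: 2 * (n // 2)].reshape(-1, 2)    # random disjoint pairs (i, j)
    R = np.eye(n)
    for i, j in pairs:
        theta = 0.5 * np.arctan2(-2.0 * Sigma[i, j], Sigma[i, i] - Sigma[j, j])
        c, s = np.cos(theta), np.sin(theta)
        R[i, i], R[i, j] = c, -s
        R[j, i], R[j, j] = s, c
    return R, pairs

rng = np.random.default_rng(0)
M = rng.normal(size=(8, 8))
Sigma = M @ M.T + 0.1 * np.eye(8)                  # hypothetical covariance
R, pairs = basic_pca_rotation(Sigma, rng)
Sigma_rot = R @ Sigma @ R.T
print([float(abs(Sigma_rot[i, j])) for i, j in pairs])   # near zero for each pair
```

In the full method this step would be applied, possibly several times, to the covariance produced by the isotropic transformation rather than to the raw covariance used in this toy example.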

3.4 Relation with Major Existing Strategies 

The proposed algorithm introduces a novel strategy in which the
transformation matrix is expressed in a factored form of pairwise rotation
matrices. For constructing each rotation, only the variance-covariance matrix
is used. In contrast, some existing linear binary hashing algorithms (such as
ITQ) use an objective function that is calculated directly from the data
(e.g., the quantization error due to discretization). These data-dependent
objective functions capture non-Gaussian properties of the data distribution.

On the other hand, an arbitrary probability distribution function has an
expansion series whose lowest-order term is a Gaussian distribution. Such an
expansion is called the Edgeworth expansion [34]. From this viewpoint, our
algorithm can be regarded as taking the lowest-order approximation, whereas
ITQ and other data-dependent methods take higher-order non-Gaussian terms into
account. Omitting higher-order terms enables an analytical treatment, which
provides a simple and computationally efficient binarization procedure. Even
though the higher-order terms are disregarded, the proposed method still
achieves considerably high accuracy, as explained below.

4 Experiments 

In the experiments, 128-dimensional Gaussian toy data, 128-dimensional SIFT
data, and high-dimensional VLAD data of various dimensions are used. The
Gaussian data are used for evaluating the theoretical behavior of the proposed
algorithm. The SIFT data are used for comparison with existing methods that
are not feasible in high dimensions. The VLAD data are used for evaluating the
algorithm in comparison with the state-of-the-art high-dimensional method.

4.1 Experimental Protocols 

Settings We use Top-10 recall as the performance measure of binary hashing.
Euclidean nearest neighbors in the original feature space are used as the
ground truth. For the Gaussian data, 10K data points are used for training, 2K
for queries, and 100K for the database. For the SIFT data, we use the SIFT1M
dataset [8] and follow the original protocol (100K training set, 10K query
set, and 1M database set). For creating the VLAD data, we use the ILSVRC2010
dataset [?]. 25600-dimensional and 64000-dimensional VLAD features are
calculated from the original SIFT data. 20K points for training and 5K points
for queries are then picked randomly. The rest of the dataset (about 1M
points) is used for the database.

Existing methods to be compared We choose the following counterpart methods.

Sparse Random Rotation (SRR): This is a random method corresponding to our
sequence-of-sparse-matrices scheme. The transformation matrix for SRR has the
same form as that of our method (Fig. 3), except that there is no sorting and
the rotation angle for each pair is chosen randomly. The number of basic
rotations applied is set to $\lceil \log_2 n \rceil$.

Iterative Quantization (ITQ) [20]: This is one of the most well-known methods
and maintains nearly state-of-the-art performance for a wide range of data. It
is considered a reasonable performance counterpart for the low-dimensional
case.

Isotropic Hashing (ISO) [32]: This is the original method that generates an
orthogonal transformation making the variances completely isotropic. In the
high-dimensional case (d > 3), there are generally an infinite number of
isotropic states for any variance-covariance matrix. Each isotropic state has
a different retrieval performance because the states differ in entropy and
higher-order cumulants (non-Gaussian effects). It is considered a counterpart
that measures the quality of our isotropic transformation. We use the
Lift-and-Projection optimization algorithm proposed in [32].

PCA Hashing (PCA): PCA hashing, as its name suggests, uses a linear
transformation to the PCA basis. As discussed in Section 2.3, the PCA basis is
the opposite extreme of the isotropic basis with regard to the trade-off
between quantization error and entropy.

K-means Hashing (KMH) [12]: This is a recently proposed state-of-the-art
method. It uses k-means vector quantization and optimizes the binary code
assignment for each cluster center. It is thus a kind of nonlinear method, and
it is selected for evaluating binary hashing performance in comparison with
nonlinear methods. We use the algorithm parameters b = 4 and M = n_dim/b with
50 iterations, as defined in [12].

Bilinear Projection-based Binary Codes (BPBC) [10]: This is the
state-of-the-art high-dimensional hashing method using a bilinear
transformation. It is considered the baseline method. We use the algorithm
parameters d_1 = 128 and d_2 = n_dim/d_1 with 50 iterations.
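The evaluation measure described under Settings can be sketched as follows. The exact definition is an assumption here: for each query, the ground truth is taken to be its 10 Euclidean nearest neighbors, and the recall is the fraction of them found among the first `num_retrieved` database items ranked by Hamming distance. The function name and the toy data are hypothetical, and the brute-force distance computation is only suitable for small examples.

```python
import numpy as np

def top_k_recall(q_codes, db_codes, q_feats, db_feats, num_retrieved, k=10):
    """Fraction of the k Euclidean nearest neighbors found in the Hamming top list."""
    # Ground truth: k nearest neighbors in the original feature space
    d2 = ((q_feats[:, None, :] - db_feats[None, :, :]) ** 2).sum(axis=-1)
    truth = np.argsort(d2, axis=1)[:, :k]
    # Hamming distance between +/-1 codes of length m: (m - <b_q, b_d>) / 2
    m = db_codes.shape[1]
    ham = (m - q_codes.astype(np.int32) @ db_codes.astype(np.int32).T) // 2
    ranked = np.argsort(ham, axis=1, kind="stable")[:, :num_retrieved]
    hits = [len(np.intersect1d(t, r)) for t, r in zip(truth, ranked)]
    return float(np.mean(hits)) / k

rng = np.random.default_rng(0)
db = rng.normal(size=(200, 16))                  # toy database features
queries = rng.normal(size=(20, 16))              # toy query features
db_codes = np.where(db >= 0, 1, -1)              # trivial sign codes for the demo
q_codes = np.where(queries >= 0, 1, -1)

r10 = top_k_recall(q_codes, db_codes, queries, db, num_retrieved=10)
r50 = top_k_recall(q_codes, db_codes, queries, db, num_retrieved=50)
r200 = top_k_recall(q_codes, db_codes, queries, db, num_retrieved=200)
print(r10, r50, r200)
```

Sweeping `num_retrieved` produces one recall curve per method, with the number of retrieved points on the horizontal axis as in the result figures.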

4.2 Toy-data Experiment 

First, we use artificial Gaussian data to observe the theoretical behavior of
the proposed algorithm discussed above.

A 128-dimensional random variance-covariance matrix is created and used to
generate mean-centered Gaussian data. To create the variance-covariance
matrix, a diagonal matrix with random positive eigenvalues drawn from a
log-normal distribution is generated. The diagonal matrix is then rotated by a
random rotation. We consider two different eigenvalue distributions. One uses
a log-normal distribution with a log-variance of one (sphere-like
distribution), and the other uses a log-normal distribution with a
log-variance of three (sharp distribution).
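The toy data construction can be sketched as below. Several details are assumptions, since the text does not fix them: the log-mean of the eigenvalues is set to zero, the random orthogonal matrix is obtained from a QR decomposition of a Gaussian matrix (its determinant may be -1, which does not change the resulting covariance), and the function name is hypothetical.

```python
import numpy as np

def sample_toy_gaussian(n, log_variance, num_points, rng):
    """Mean-centered Gaussian data with log-normal eigenvalues and a random basis."""
    eigvals = rng.lognormal(mean=0.0, sigma=np.sqrt(log_variance), size=n)
    G = rng.normal(size=(n, n))
    Qm, Rm = np.linalg.qr(G)
    Qm = Qm * np.sign(np.diag(Rm))               # fix column signs of the QR basis
    Sigma = Qm @ np.diag(eigvals) @ Qm.T         # rotated diagonal covariance
    Z = rng.normal(size=(num_points, n))
    X = (Z * np.sqrt(eigvals)) @ Qm.T            # rows have covariance Sigma
    return X, Sigma, eigvals

rng = np.random.default_rng(0)
X_sphere, Sigma_sphere, ev_sphere = sample_toy_gaussian(128, 1.0, 2000, rng)
X_sharp, Sigma_sharp, ev_sharp = sample_toy_gaussian(128, 3.0, 2000, rng)
print(ev_sphere.max() / ev_sphere.min(), ev_sharp.max() / ev_sharp.min())
```

The ratio between the largest and smallest eigenvalue is much larger for the log-variance of three, which is what makes that distribution "sharp".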

Fig. 4 shows the retrieval results. In the case of the sphere-like
distribution (upper row), most methods differ little in accuracy because the
shape of the distribution is hardly affected by rotation. A notable point is
that in the case of the sharp distribution (lower row), the completely
isotropic PRH is clearly inferior to the other cases, although Isotropic
Hashing, which also has completely isotropic variances, achieves reasonable
performance. As discussed in Section 4.1, there are an infinite number of
isotropic states. The Lift-and-Projection algorithm in Isotropic Hashing tends
to find entropically favorable isotropic states. Although PRH is extremely
simple and sparse, it sometimes reaches entropically inferior isotropic
states. However, this inferiority is reasonably overcome by PCAT or RSPCA
without loss of sparsity.

[Figure 4: six panels, x-axis: number of retrieved points. (a) sphere-like
Gaussian, comparison with existing methods; (b) sphere-like Gaussian, effect
of PCAT; (c) sphere-like Gaussian, effect of RSPCA; (d) sharp Gaussian,
comparison with existing methods; (e) sharp Gaussian, effect of PCAT; (f)
sharp Gaussian, effect of RSPCA.]

Fig. 4. Top-10 NN retrieval results for 128-dimensional Gaussian data.
PRH(m, n, $\lambda$) indicates the proposed method with m applications of the
basic isotropic rotation, n applications of the basic PCA rotation, and PCAT
parameter $\lambda$ (Eq. (10)). The upper row is for the data with a
log-variance of one. The lower row is for the data with a log-variance of
three (described in Section 4.2). The abbreviated legends of plot (d) are the
same as those of plot (a).

The lower middle plot of Fig. 4 indicates that an almost-PCA angle
($\lambda \sim 1$) at each pairwise rotation leads to good performance for the
sharp Gaussian distribution. It is important to distinguish our sequential
pairwise almost-PCA rotation from the PCA hashing rotation. To obtain the
exact PCA basis, it is necessary to account for all n(n - 1)/2 possible pairs.
PCAT, however, deals with only O(n log_2 n) pairs.


4.3 Real Datasets

Fig. 5. Top-10 NN retrieval results for 128-dimensional SIFT1M data. The
meaning of PRH(m, n, $\lambda$) is given in Figure 4.

Fig. 6. Top-10 NN retrieval results for SIFT1M data with PCA dimension
reduction. Left: 64-bit (1/2 reduction) case. Middle: 32-bit (1/4 reduction)
case. Right: behavior of PRHs for different degrees of reduction. The meaning
of PRH(m, n, $\lambda$) is given in Figure 4.


Low Dimensional Case The SIFT1M case is considered next. Fig. 5 shows the
retrieval results. As in the Gaussian case, the completely isotropic PRH
(PRH(7, 0, 0.0)) gives unfavorable accuracy, whereas PCAT and RSPCA attain
good performance. In particular, RSPCA achieves a remarkably better retrieval
result than the other methods in figure (d).


Relation to dimension reduction Although a sparse dimension reduction scheme
is not devised in this study, we examine the effect of dimension reduction on
the performance of the proposed algorithm. As with existing methods, the PCA
basis is tentatively used for dimension reduction in PRH and SRR. Note that
this does not preserve the sparsity of the transformation. Fig. 6 shows the
results for SIFT1M. The proposed algorithm clearly maintains higher
performance than the other methods at each degree of dimension reduction.

[Figure 7: six panels. (a), (d) comparison with baseline method; (b), (e)
effect of PCAT; (c), (f) effect of RSPCA.]

Fig. 7. Top-10 NN retrieval results for 64000-dimensional (upper) and
25600-dimensional (lower) VLAD data. The meaning of PRH(m, n, $\lambda$) is
given in Figure 4.


High-dimensional Case The high-dimensional case, which is the main
contribution of our algorithm, is examined next. Fig. 7 shows the retrieval
results for 64000-dimensional and 25600-dimensional VLAD features calculated
from the ILSVRC2010 dataset. PCAT achieves state-of-the-art retrieval
accuracy, and it attains high performance for each dimension. In this
experiment, contrary to the lower-dimensional case, RSPCA is inferior to PCAT.
We think that this is the result of improper random pairing in RSPCA, because
the number of possible pairs is O(n^2).

4.4 Computational Cost 

Since our implementation uses the MATLAB sparse matrix datatype, it is
difficult to fairly evaluate its encoding cost in comparison with other
methods that use optimized dense matrix operations. We therefore use a
tentative environment for the evaluation. We show a comparison of the number
of product operations and the speed improvement ratio relative to BPBC in the
encoding phase (Fig. 8). The improvement ratio is measured with a naive C
implementation of dense/sparse matrix operations and is compared with the
theoretical ratio (based on the number of product operations). The number of
sum operations is also reduced in our method: we need only one sum operation
per two product operations, whereas BPBC needs almost the same number of sum
and product operations. This property would explain why the result of the
naive C implementation exceeds the theoretical speed improvement ratio.

Pairwise Rotation Hashing for High-dimensional Features

Table 1. Learning time for each method. Left: learning time for the SIFT1M
dataset. Right: learning time for the 25600-dimensional VLAD case.

SIFT1M dataset
  Methods           Learn t(s)
  PRH(7,0,0.5)      0.11
  PRH(7,7,0.0)      0.10
  SRR               0.015
  ITQ               12.0
  ISO               3.90
  PCA               0.09
  KMH               402

25600-dimensional VLAD
  Methods           Learn t(s)
  PRH(15,0,0.3)     344
  PRH(15,15,0.0)    527
  SRR               24
  BPBC              1740

Fig. 8. Encoding cost. Left: comparison of the number of product operations.
Right: encoding speed improvement ratio of PRH in comparison with BPBC.
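The product-operation counts behind this comparison can be estimated with a short calculation. The sketch below is an assumption-laden reading of the costs stated in the text: each basic rotation has 2n fill-ins, so PRH with ceil(log2 n) basic rotations needs 2n ceil(log2 n) products, and a bilinear projection with factors of size d1 x d1 and d2 x d2 (d1 = 128, d2 = n/d1) needs n d1 + n d2 products. The function names are hypothetical, and the figures are theoretical counts rather than the measured values of Fig. 8.

```python
import math

def prh_products(n):
    """2n products per basic rotation, ceil(log2 n) basic rotations."""
    return 2 * n * math.ceil(math.log2(n))

def bpbc_products(n, d1=128):
    """Bilinear projection of a d1 x d2 folded vector: d1*d1*d2 + d1*d2*d2."""
    d2 = n // d1
    return d1 * d1 * d2 + d1 * d2 * d2

for n in (25600, 64000):
    ratio = bpbc_products(n) / prh_products(n)
    print(n, prh_products(n), bpbc_products(n), round(ratio, 1))
```

Under these assumptions the ratio grows with the dimension, from roughly 11 at 25600 dimensions to roughly 20 at 64000 dimensions.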

Table 1 compares the learning times. PRH learns very fast in each case. For
the high-dimensional case, the learning time for 25600 dimensions is shown so
that all of the data can be stored in memory. Our implementation is not
optimized (efficient treatment of sparse and symmetric matrices is possible).


5 Conclusion 


We have proposed Pairwise Rotation Hashing (PRH), a linear binary hashing
algorithm with O(n log n) encoding cost. PRH is based on a two-dimensional
analytical study of the trade-off between quantization error and entropy. The
proposed algorithm is also fast in the learning phase because it needs only
O(n log n) computations in the iteration loop. It shows high hashing accuracy
in retrieval tasks in both low and high dimensions. In particular, it achieves
state-of-the-art performance in high dimensions (10K or higher).

We still have room for improvement. In this study, a dimension reduction
scheme compatible with the pairwise concept was not included. Our idea is that
dimension reduction can also be done in a pairwise fashion, i.e., by dropping
minor components in the pairwise PCA in Eq. (10). A key issue is to find an
appropriate pairing method in the pairwise PCA part. Even though RSPCA
demonstrated high performance, it can be further improved if components are
paired selectively rather than randomly. In RSPCA, there is potential for
finding a more sophisticated pairing scheme that favorably balances isotropy
with entropy. However, an exhaustive search over the O(n^2) possible pairs
substantially degrades the learning speed. A non-random but efficient pairing
scheme is needed.